Disintegration theorem

In mathematics, the disintegration theorem is a result in measure theory and probability theory. It rigorously defines the idea of a non-trivial "restriction" of a measure to a measure-zero subset of the measure space in question. It is related to the existence of conditional probability measures. In a sense, "disintegration" is the opposite process to the construction of a product measure.

Motivation

Consider the unit square in the Euclidean plane R², S = [0, 1] × [0, 1]. Consider the probability measure μ defined on S by the restriction of two-dimensional Lebesgue measure λ² to S. That is, the probability of a measurable event E ⊆ S is simply the area of E.

Consider a one-dimensional subset of S such as the line segment L_x = {x} × [0, 1]. L_x has μ-measure zero, and since the Lebesgue measure space is a complete measure space, every subset of L_x is a μ-null set:

$E \subseteq L_{x} \implies \mu (E) = 0.$

While true, this is somewhat unsatisfying. It would be nice to say that μ "restricted to" L_x is the one-dimensional Lebesgue measure λ¹, rather than the zero measure. The probability of a "two-dimensional" event E could then be obtained as an integral of the one-dimensional probabilities of the vertical "slices" E ∩ L_x: more formally, if μ_x denotes one-dimensional Lebesgue measure on L_x, then

$\mu (E) = \int_{[0, 1]} \mu_{x} (E \cap L_{x}) \, \mathrm{d} x$

for any "nice" E ⊆ S. The disintegration theorem makes this argument rigorous in the context of measures on metric spaces.
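The slicing formula above can be checked numerically on a concrete set. The following is a minimal sketch, assuming the illustrative choice E = {(x, y) ∈ S : y ≤ x²}, whose area is 1/3; the set E and the function names are assumptions made for this example and do not come from the theorem itself.

```python
# Sketch: recover the area of E = {(x, y) in S : y <= x**2} from its vertical slices.
# E and the helper names are illustrative assumptions.

def slice_measure(x: float) -> float:
    """One-dimensional Lebesgue measure of the slice {y in [0, 1] : y <= x**2}."""
    return min(max(x * x, 0.0), 1.0)

def integrate_slices(n: int = 10_000) -> float:
    """Midpoint-rule approximation of the integral of slice_measure over [0, 1]."""
    h = 1.0 / n
    return h * sum(slice_measure((i + 0.5) * h) for i in range(n))

area = integrate_slices()
print(area)  # close to 1/3, the two-dimensional area of E
```

Here each slice E ∩ L_x is an interval of length x², so integrating the slice lengths over x reproduces the area of E, even though every individual slice has μ-measure zero.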

Statement of the theorem

(Hereafter, P(X) will denote the collection of Borel probability measures on a metric space (X, d).)

Let Y and X be two Radon spaces (i.e. separable metric spaces on which every probability measure is a Radon measure). Let μ ∈ P(Y), let π : Y → X be a Borel-measurable function, and let ν ∈ P(X) be the pushforward measure ν = π_∗(μ) = μ ∘ π⁻¹. Then there exists a ν-almost everywhere uniquely determined family of probability measures {μ_x}_{x∈X} ⊆ P(Y) such that

the function $x \mapsto \mu_{x}$ is Borel measurable, in the sense that $x \mapsto \mu_{x} (B)$ is a Borel-measurable function for each Borel-measurable set B ⊆ Y;
μ_x "lives on" the fiber π⁻¹(x): for ν-almost all x ∈ X,

$\mu_{x} \left( Y \setminus \pi^{-1} (x) \right) = 0,$

and so μ_x(E) = μ_x(E ∩ π⁻¹(x)) for every Borel set E ⊆ Y;

for every Borel-measurable function f : Y → [0, +∞],

$\int_{Y} f(y) \, \mathrm{d} \mu (y) = \int_{X} \int_{\pi^{-1} (x)} f(y) \, \mathrm{d} \mu_{x} (y) \mathrm{d} \nu (x).$

In particular, for any event E ⊆ Y, taking f to be the indicator function of E,

$\mu (E) = \int_{X} \mu_{x} \left( E \right) \, \mathrm{d} \nu (x).$

^[1]
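When Y is a finite set, the theorem reduces to elementary conditional probability, which makes the statement easy to verify by hand or by machine. The following is a minimal sketch under that assumption: the function `disintegrate`, the five-point measure `mu` and the test function `f` are all hypothetical choices made for illustration. Measurability of x ↦ μ_x(B) is automatic in the finite case, so only the fibre condition and the integral identity are checked.

```python
from collections import defaultdict
from fractions import Fraction

def disintegrate(mu, pi):
    """Disintegrate a finitely supported probability measure.

    mu: dict mapping points y to probabilities; pi: function on those points.
    Returns (nu, fibres), where nu is the pushforward of mu under pi and
    fibres[x] is the measure mu_x, defined only for x with nu(x) > 0
    (that is, for nu-almost every x).
    """
    nu = defaultdict(Fraction)
    for y, p in mu.items():
        nu[pi(y)] += p
    fibres = {x: {} for x in nu if nu[x] > 0}
    for y, p in mu.items():
        x = pi(y)
        if nu[x] > 0:
            fibres[x][y] = p / nu[x]
    return dict(nu), fibres

# Hypothetical example: five grid points, projected onto the first coordinate.
mu = {
    (0, 0): Fraction(1, 8),
    (0, 1): Fraction(3, 8),
    (1, 0): Fraction(1, 4),
    (1, 1): Fraction(1, 8),
    (2, 0): Fraction(1, 8),
}
pi = lambda y: y[0]
nu, fibres = disintegrate(mu, pi)

# Fibre condition: each mu_x is a probability measure carried by pi^{-1}(x).
for x, mu_x in fibres.items():
    assert sum(mu_x.values()) == 1
    assert all(pi(y) == x for y in mu_x)

# Integral identity: integrating f against mu equals the iterated integral.
f = lambda y: y[0] + 2 * y[1]
lhs = sum(f(y) * p for y, p in mu.items())
rhs = sum(nu[x] * sum(f(y) * q for y, q in fibres[x].items()) for x in fibres)
assert lhs == rhs
```

Exact rational arithmetic (`Fraction`) is used so that the two sides of the identity can be compared with `==` rather than up to rounding error.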

Applications

Product spaces

The motivating example of the unit square is a special case of a product space, a setting to which the disintegration theorem applies.

When Y is written as a Cartesian product Y = X₁ × X₂ and π_i : Y → X_i is the natural projection, then each fibre π₁⁻¹(x₁) can be canonically identified with X₂ and there exists a Borel family of probability measures $\{ \mu_{x_{1}} \}_{x_{1} \in X_{1}}$ in P(X₂) (which is (π₁)_∗(μ)-almost everywhere uniquely determined) such that

$\mu = \int_{X_{1}} \mu_{x_{1}} \, \mu \left(\pi_1^{-1}(\mathrm d x_1) \right)= \int_{X_{1}} \mu_{x_{1}} \, \mathrm{d} (\pi_{1})_{*} (\mu) (x_{1}),$

which is in particular

$\int_{X_1\times X_2} f(x_1,x_2)\, \mu(\mathrm d x_1,\mathrm d x_2) = \int_{X_1}\left( \int_{X_2} f(x_1,x_2) \mu(\mathrm d x_2|x_1) \right) \mu\left( \pi_1^{-1}(\mathrm{d} x_{1})\right)$

and

$\mu(A \times B) = \int_A \mu\left(B|x_1\right) \, \mu\left( \pi_1^{-1}(\mathrm{d} x_{1})\right).$

The relation to conditional expectation is given by the identities

$\operatorname E(f|\pi_1)(x_1)= \int_{X_2} f(x_1,x_2) \mu(\mathrm d x_2|x_1),$

$\mu(A\times B|\pi_1)(x_1)= 1_A(x_1) \cdot \mu(B| x_1).$
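As a worked instance of these formulas, assume (purely for illustration) that X₁ = X₂ = [0, 1] and that μ has density x₁ + x₂ with respect to two-dimensional Lebesgue measure. Under this assumption the first marginal and the conditional measures can be written explicitly as

$(\pi_{1})_{*} (\mu) (\mathrm{d} x_{1}) = \left( x_{1} + \tfrac{1}{2} \right) \mathrm{d} x_{1}, \qquad \mu (\mathrm{d} x_{2} | x_{1}) = \frac{x_{1} + x_{2}}{x_{1} + \tfrac{1}{2}} \, \mathrm{d} x_{2}.$

Taking A = B = [0, 1/2], the conditional probability of B is

$\mu (B | x_{1}) = \frac{1}{x_{1} + \tfrac{1}{2}} \int_{0}^{1/2} (x_{1} + x_{2}) \, \mathrm{d} x_{2} = \frac{\tfrac{1}{2} x_{1} + \tfrac{1}{8}}{x_{1} + \tfrac{1}{2}},$

so that

$\mu (A \times B) = \int_{0}^{1/2} \mu (B | x_{1}) \left( x_{1} + \tfrac{1}{2} \right) \mathrm{d} x_{1} = \int_{0}^{1/2} \left( \tfrac{1}{2} x_{1} + \tfrac{1}{8} \right) \mathrm{d} x_{1} = \tfrac{1}{8},$

which agrees with integrating the density x₁ + x₂ directly over [0, 1/2] × [0, 1/2].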

Vector calculus

The disintegration theorem can also be seen as justifying the use of a "restricted" measure in vector calculus. For instance, in Stokes' theorem as applied to a vector field flowing through a compact surface Σ ⊂ R³, it is implicit that the "correct" measure on Σ is the disintegration of three-dimensional Lebesgue measure λ³ on Σ, and that the disintegration of this measure on ∂Σ is the same as the disintegration of λ³ on ∂Σ. ^[2]

Conditional distributions

The disintegration theorem can be applied to give a rigorous treatment of conditioning probability distributions in statistics, while avoiding purely abstract formulations of conditional probability. ^[3]
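One practical reading of conditioning as disintegration is two-stage sampling: draw x from the marginal ν, then draw y from the fibre measure μ_x. The sketch below assumes, for illustration only, that μ is the uniform distribution on the triangle {0 ≤ x₂ ≤ x₁ ≤ 1}. In that case the first marginal has density 2x₁ on [0, 1] and μ_{x₁} is uniform on [0, x₁]. The function name, the seed and the sample size are hypothetical choices.

```python
import math
import random

def sample_via_disintegration(rng):
    """Draw (x1, x2) uniformly from the triangle {0 <= x2 <= x1 <= 1} in two stages."""
    # Stage 1: x1 ~ nu, the first marginal, with density 2*x1 on [0, 1].
    # Its CDF is x1**2, so inverse-CDF sampling gives sqrt(U) for uniform U.
    x1 = math.sqrt(rng.random())
    # Stage 2: x2 ~ mu_{x1}, the uniform distribution on the fibre [0, x1].
    x2 = rng.uniform(0.0, x1)
    return x1, x2

rng = random.Random(0)  # fixed seed so the run is reproducible
n = 200_000
hits = sum(1 for _ in range(n) if sample_via_disintegration(rng)[1] <= 0.5)
estimate = hits / n
print(estimate)  # Monte Carlo estimate of mu({x2 <= 1/2}); the exact value is 3/4
```

The exact value 3/4 follows from the area of the part of the triangle with x₂ ≤ 1/2, which is 3/8, divided by the total area 1/2. The estimate should agree with it up to sampling error.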

References

^[1] Dellacherie, C. & Meyer, P.-A. (1978). Probabilities and Potential. North-Holland Mathematics Studies, North-Holland Publishing Co., Amsterdam.
^[2] Ambrosio, L., Gigli, N. & Savaré, G. (2005). Gradient Flows in Metric Spaces and in the Space of Probability Measures. ETH Zürich, Birkhäuser Verlag, Basel. ISBN 3-7643-2428-7.
^[3] Chang, J.T. & Pollard, D. (1997). "Conditioning as disintegration". Statistica Neerlandica 51 (3). http://www.stat.yale.edu/~jtc5/papers/ConditioningAsDisintegration.pdf.